Skip to content

fix(0.2.1): wire request context + dedup in-flight server analyses#132

Open
pmclSF wants to merge 1 commit intomainfrom
fix/0.2.1-server-context-singleflight
Open

fix(0.2.1): wire request context + dedup in-flight server analyses#132
pmclSF wants to merge 1 commit intomainfrom
fix/0.2.1-server-context-singleflight

Conversation

@pmclSF
Copy link
Copy Markdown
Owner

@pmclSF pmclSF commented May 2, 2026

Summary

Closes the two server defects flagged in the 0.2.0 launch-readiness review. The CHANGELOG block under "Known issues tracked for 0.2.1" can be removed in a follow-up once this lands.

Defect 1 — request context not wired. Handlers ignored r.Context() and called engine.RunPipeline (which wraps context.Background()). Client disconnects left the analysis running in the background.

Defect 2 — mutex-blocking cache. getResult held Server.mu via defer Unlock() for the full analysis duration. One slow analysis serialized every other pipeline-needing request behind it.

Approach

Both fixes share one structural change:

  • Fast path: sync.RWMutex.RLock — warm-cache hits no longer contend with each other or with writers.
  • Slow path: singleflight.Group.DoChan — concurrent callers wait on a single in-flight analysis (one analysis per cache window, even under load).
  • Per-caller cancellation: each handler threads r.Context() through getResult; the select on ch | ctx.Done() returns ctx.Err() immediately on disconnect. The shared analysis runs with context.Background() so a single caller's disconnect doesn't cancel work other waiters depend on.

Tradeoff

A single-waiter request whose context is canceled won't (yet) cancel the underlying analysis — only the handler returns. Reference-counting waiters and canceling when none remain is on the 0.3 list, documented inline in server.go.

Test plan

  • TestGetResult_CacheHit — fast path returns the cached pointer
  • TestGetResult_RespectsCanceledContext — pre-canceled context returns context.Canceled within 2s (pre-fix: hung until analysis completed)
  • TestGetResult_ConcurrentCallsShareCache — 50 concurrent callers on warm cache observe the same report pointer
  • go build ./..., go test ./..., go vet ./... clean

Dependency

golang.org/x/sync v0.10.0 for singleflight. Pinned to v0.10.0 because v0.20.0 bumps the go directive to 1.25; this is compatible with the existing go 1.23.

🤖 Generated with Claude Code

Two related defects in `terrain serve`, both flagged by the launch-
readiness review and verified in the codebase:

1. **Request context not wired.** The HTTP handlers ignored
   r.Context() and called engine.RunPipeline (which wraps
   context.Background()), so a client disconnect during a long
   analysis left the analysis running in the background with no
   handler waiting on it.

2. **Mutex-blocking analysis cache.** getResult held Server.mu via
   defer-Unlock for the full analysis duration. One slow analysis
   serialized every other request needing a pipeline result behind a
   single goroutine, regardless of whether the cache was warm enough
   for them.

Both ship together because the right fix to (2) replaces the cache
mutex with an RWMutex + singleflight, which also gives us a clean
seam for (1):

  * Fast path: RWMutex.RLock so warm-cache hits don't contend.
  * Slow path: singleflight.Group.DoChan dedups concurrent in-flight
    analyses (one analysis per cache window, even with N waiters).
  * Per-caller cancellation: each handler threads r.Context() through
    getResult; the select on (ch | ctx.Done()) returns ctx.Err()
    immediately on disconnect. The shared analysis runs with
    context.Background() so a single caller's disconnect doesn't kill
    work other waiters depend on.

Tradeoff documented inline: a single-waiter request whose context is
canceled won't (yet) cancel the underlying analysis. Reference-
counting waiters is on the 0.3 list.

Tests added:
  * TestGetResult_CacheHit — fast path returns cached pointer
  * TestGetResult_RespectsCanceledContext — pre-canceled context
    returns context.Canceled within 2s rather than blocking on
    analysis (pre-fix this hung until pipeline completion)
  * TestGetResult_ConcurrentCallsShareCache — 50 concurrent callers
    on a warm cache observe the same report pointer

Dep: golang.org/x/sync v0.10.0 (singleflight). Pinned to v0.10.0
because v0.20.0 bumps the go directive to 1.25; v0.10.0 is
compatible with the existing go 1.23.

Server-package doc-comment refreshed to describe the new
concurrency model and remove the now-fixed "known issues" block
that was added in PR #131.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 2, 2026

[RISK] Terrain — Merge with caution

High-severity gaps found in changed code.

Metric Value
Changed files 5 (2 source · 1 test)
Impacted units 8
Protection gaps 2
Tests selected 1 of 772 (0% of suite)

Coverage gaps in changed code

  • internal/server/handlers.go [LOW] — handlers.go has no observed test coverage.
    → Add unit tests for handlers.go.
  • internal/server/server.go [MED] — Exported function ListenAndServe has no observed test coverage.
    → Add unit tests for exported function ListenAndServe — this is public API surface.

Pre-existing issues (3)

  • internal/server/handlers.go [HIGH] — [blastRadiusHotspot] Changes to this file propagate to 351 tests (14 direct, 337 indirect). High blast radius increases regression risk.
  • internal/server/server.go [HIGH] — [blastRadiusHotspot] Changes to this file propagate to 351 tests (14 direct, 337 indirect). High blast radius increases regression risk.
  • internal/server/server_test.go [MED] — [fixtureFragilityHotspot] Fixture 'newServerWithCachedReport' is used by 13 tests across 1 files. A single change cascades widely.

Recommended tests

1 test(s) with exact coverage of 6 impacted unit(s). 2 impacted unit(s) have no covering tests in the selected set.

Test Confidence Why
internal/server/server_test.go exact exact coverage of Config, DefaultHost, DefaultPort + 3 more

Owners: PMCLSF

Limitations
  • No coverage artifacts provided; protection gaps reflect missing data, not measured absence. Provide --coverage to improve accuracy.
  • Mixed test cultures reduce cross-framework optimization confidence. Consider standardizing on fewer frameworks.

Generated by Terrain · terrain pr --json for machine-readable output

Targeted Test Results

Terrain selected 1 test(s) instead of the full suite.

  • Go tests: passed

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 2, 2026

Terrain AI Risk Review

Metric Value
AI surfaces 13
Eval scenarios 16
Impacted scenarios 0
Uncovered surfaces 13

Decision: PASS — AI surfaces are covered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant